Text Editor Interfaces for Semantic Editors

Author

  • Rodney M. Bates
Abstract

We describe a data structure that enables a semantic programming language editor to present a conventional text editing interface to users, yet can be viewed as abstract syntax, while working on incomplete and/or incorrect programs. It reduces the chronically high storage consumption of tree representations. It maintains non-syntactic formatting information, e.g. comments and blank lines. It is also capable of representing arbitrary unparsable or unparsed text. It internally represents syntactically illegal text in syntax-corrected form, but nonetheless displays what the user typed. The motivation for the work is an editor that provides semantic editing functions while presenting the user with a text editing interface.

Introduction and motivation

Over a decade ago, there was widespread research activity in syntax-directed editors for programming languages. This activity has shrunk greatly, in large part because such editors present their users with an interface that is very hard to use. They require that the program code being edited be continuously syntactically correct. Unfortunately, any path to a user's editing goal whose intermediate points all satisfy syntactic correctness is usually substantially longer than necessary. Conventional text editors, e.g. [10], long available, allow users to accomplish their editing goals far more quickly.

Although syntax-directed editing was rejected by users, it has great unrealized potential for better user assistance in other ways. Since it maintains some form of syntactically analyzed code, it can be the basis for many language-aware browsing and editing functions. The most powerful editing and browsing functions involve static semantics. The simplest example is identifier browsing, where a command takes the user from a use of an identifier to its declaration or to its other uses. A more sophisticated example would be stepping through all the actual parameter expressions, supplied in various calls, that correspond to a particular formal parameter. Transformations such as replacing non-local references to a variable by a parameter are also possible. Type inference can be used to infer partial or complete declarations from uses.

Several programmers' tools have been developed which do at least simple forms of semantic browsing. These also have not realized the potential of semantic editing, because they only work on bodies of code which are complete and statically correct, or which have been so at one time and have seen only small changes since. This assumption is, of course, unrealistic, as most programming work is done on code which is far from complete and correct.

Contemporaneous with syntax-directed editor work, editors doing semantic analysis were also developed [2][17][13]. These provide incremental static semantic analysis and work on incomplete and incorrect programs, even at the semantic level. The PSG system [2] can, for example, infer or partially infer declarations, types, etc., from uses of identifiers. Two of these semantic editors, however, also have at least partially syntax-directed front ends and thus suffer to a degree from the same user interface problems as the syntax-directed editors.

Internal representations of programs in parsed form are chronically space-hungry. It is not uncommon for parsed tree representations to consume an order of magnitude more memory than the corresponding text files, even without semantic information. With increasing interest being shown in whole-program analysis, optimization, etc., it is significant that working sets for these representations can easily exceed even the large, cheap memories found on today's computers.
So, the motive for this work is the development of a semantic editor which can offer the kinds of semantic functions we have described, work on incomplete and incorrect programs, and simultaneously present a true text-editor style user interface. Such a tool needs an internal syntactic representation of the edited documents. The work we report here is a prerequisite for such an editor: it supplies that representation while still presenting the user with an acceptable, conventional text editing interface. It also reduces internal memory requirements.

As is customary, the central representation and algorithms are language-independent. A language definition is expressed in a declarative notation (for which there is an editor) and converted mechanically to the tables and data structures necessary to specialize to an editor for a specific language. The language-independent algorithms use an abstract interface to obtain this language-specific data.

The remainder of this paper is organized as follows. Section 1 describes the relation of this work to other language-aware editors. Section 2 formally describes the internal data structure that our scheme uses and that is central to our integration between textual editing and internal tree representation. This is presented in several subsections describing an integrated system of abstract and concrete grammar symbols, the tree-like representation of source code itself, the method of reconstructing a textual representation, special treatment of syntactic lists, determination of layout of the textual form, methods of representing non-syntactic information and syntactically incorrect information inside the tree representation, how the representation can be viewed as an abstract syntax tree, and the method used to achieve good asymptotic performance. Section 3 gives an informal, descriptive account of some of the principal algorithms for manipulating this representation. Finally, Section 4 summarizes the work.

1. Relation to Other Work

There are many editors with varying degrees of internal knowledge of a programming language. For a representative sampling, see [8][9][11][15][16][17][18][6]. The capabilities these have are strongly influenced by the internal representation they use.

The template editors generally maintain a textual representation, like a general purpose text editor, but they can insert predefined blocks of text called templates. Each template is a skeleton for a programming language construct, e.g., a variable declaration. In general, the templates contain character sequences that denote nonterminals in the language's grammar, called placeholders or insertion points. Once a template has been edited or expanded, its syntactic structure becomes indistinguishable from other text. Thus templates provide syntactic awareness for initial code composition but not for subsequent modification.

The syntax-directed editors (abbreviated SDE) maintain some form of tree representation of parsed source code. Gandalf [8] is an example of an SDE. SDEs can provide template editing, but further allow modifications to be done using knowledge of the syntactic structure. Editing operations all have meanings that correspond to tree-oriented operations such as pruning, grafting, replacing a nonterminal by a template, etc. Since these are the only editing operations provided, an SDE guarantees syntactic correctness (though not syntactic completeness) at all times, which is one of its attractions. Another advantage of an SDE is that the tree representation supports additional programming-language-specific functions, e.g., static semantic analysis. It is this benefit that motivates this work.

Waters [22] argued long ago that syntax-directed editing commands alone are inadequate to support productive use by programmers.
Generally, experienced programmers have found the pure SDE-style user interface to be much too constraining and to require far more effort to accomplish many editing tasks than a traditional text-editor interface.

Several editors have provided hybrids of various kinds between text editing and syntax-directed editing. The Cornell Program Synthesizer [17] uses syntax-directed editing for certain higher-level constructs, while using text editing for lower-level constructs such as expressions. Morris and Schwartz [15] describe an editor which does not require constant syntactic correctness everywhere, but only for the prefix of source code leading to the editing site. Brun et al. [7] describe an editor that works much like a text editor, but is oriented to lexical tokens rather than characters.

The PSG system [2] maintains a tree representation, but allows any subtree, at user request, to be unparsed, i.e. converted to textual form, after which text editing may be done within it. Reparsing of this region is done later, at user request. The syntactic consequences of text editing cannot propagate out of the region. Our scheme does not have this restriction, nor does it require the user to explicitly specify a region to be unparsed.

In [16], Reiss describes the PECAN system as allowing both template and text-oriented editing, but does not describe the way these are integrated and does not describe a representation or algorithms for doing so.

The extensive recent work on the Harmonia system [6] is by far the closest to this work. Most of our goals are also stated goals of Harmonia and are addressed by it in some way. For example, Harmonia's representation treats syntactic lists specially and represents them as balanced trees, as do we in a different form, allowing logarithmic time access to a list member. It has a method, different from ours, of including non-syntactic material [21]. Harmonia has not emphasized compactness of the internal representation.
Our representation is different and reflects our concern about size. Not enough is known yet about the space requirements of either Harmonia or our representation to give an accurate space comparison. However, [6] reports that the Harmonia representation uses an average of 15 nodes per line of source code. With implementation still incomplete, we have a preliminary measurement of 5.9 nodes per line, with blank lines and whole-line comments neither counted nor represented. Our nodes will undoubtedly be bigger than those of the Harmonia representation, even after attention has been given to low-level field packing.

2. The TST Representation

In this section, we formally define our data structure, which we call a text syntax tree (TST). We show how it can be viewed in two distinct ways: as a plain text document, or as a conventional abstract syntax tree for a structured document, and how it supports arbitrary text editing and the inclusion of non-syntactic material. We explain the relationship between the TST, the abstract syntax, and the text it represents. We also describe its method of handling syntactically incorrect text.

2.1. An Integrated System of Grammar Symbols

We assume the programming language is, as usual, defined in part by a lexical grammar (abbreviated LG) and a context-free grammar (abbreviated CFG). We start with a finite alphabet which, unusually, is an alphabet of characters. The LG defines several sets of strings over this alphabet and maps each into a token. The tokens are terminals of the CFG. For a given language, we define several finite sets of grammar symbols, sufficient for an abstract grammar as well as a CFG, as follows:

  • a set of fixed tokens
  • a set of variable tokens
  • the tokens, comprising the fixed tokens and the variable tokens
  • a set of abstract list symbols
  • a set of abstract fixed symbols
  • the abstract symbols, comprising the abstract list symbols and the abstract fixed symbols
  • a set of concrete syntax nonterminals
  • the nonterminals, comprising the abstract symbols and the concrete syntax nonterminals

The tokens are terminals in the underlying grammar.
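As a rough illustration of the symbol sets listed above, the following Python sketch tags each grammar symbol with the set it belongs to and derives the combined sets from those tags. The names `Kind`, `Symbol`, `is_token`, `is_abstract` and `is_nonterminal`, as well as the toy symbols, are hypothetical and do not come from the paper. Treating the nonterminals as the abstract symbols together with the concrete syntax nonterminals is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    """Which of the five basic symbol sets a grammar symbol belongs to."""
    FIXED_TOKEN = auto()
    VARIABLE_TOKEN = auto()
    ABSTRACT_LIST = auto()
    ABSTRACT_FIXED = auto()
    CONCRETE_NONTERMINAL = auto()


@dataclass(frozen=True)
class Symbol:
    name: str
    kind: Kind


# Derived sets, expressed as unions of the basic kinds.
TOKEN_KINDS = frozenset({Kind.FIXED_TOKEN, Kind.VARIABLE_TOKEN})
ABSTRACT_KINDS = frozenset({Kind.ABSTRACT_LIST, Kind.ABSTRACT_FIXED})
# Assumption of this sketch: nonterminals = abstract symbols + concrete nonterminals.
NONTERMINAL_KINDS = ABSTRACT_KINDS | {Kind.CONCRETE_NONTERMINAL}


def is_token(sym: Symbol) -> bool:
    return sym.kind in TOKEN_KINDS


def is_abstract(sym: Symbol) -> bool:
    return sym.kind in ABSTRACT_KINDS


def is_nonterminal(sym: Symbol) -> bool:
    return sym.kind in NONTERMINAL_KINDS


# Hypothetical symbols for a toy language.
WHILE = Symbol("while", Kind.FIXED_TOKEN)
IDENT = Symbol("Ident", Kind.VARIABLE_TOKEN)
STMT_LIST = Symbol("StmtList", Kind.ABSTRACT_LIST)
WHILE_STMT = Symbol("WhileStmt", Kind.ABSTRACT_FIXED)
STMT = Symbol("Stmt", Kind.CONCRETE_NONTERMINAL)
```

Keeping one tag per symbol makes the basic sets disjoint by construction, so the combined sets reduce to simple membership tests on the tag.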
For each fixed token, there is exactly one lexeme, which is the spelling of that token in a program. Fixed tokens are so called for this reason. They correspond to the keywords, operator symbols, etc. of the programming language. For each variable token, there is a set, often infinite, of lexemes. Variable tokens correspond to the identifiers and literals of the programming language.

All of these sets are dependent on a particular programming language. However, they include a few special elements which are present for any language, do not appear in the CFG, and are used to represent characters not described by the language syntax. These symbols form a distinguished subset, which will be described presently.

We also make the common assumption that, for a given language, there is a tree grammar called the abstract grammar (abbreviated AG). It is of mostly conventional form. Rules of the AG define the shape of tree nodes, giving each a symbol and a list of children. The AG defines a set of abstract syntax trees (abbreviated ASTs). The symbols in interior nodes of an AST are abstract symbols.

For convenience in defining the relationship between the textual notation for a program and its AST, we have the abstract symbols serve as nonterminals in the CFG as well. A CFG production with left-hand side A will, when parsed, cause the building of an AST node containing symbol A. Since abstract nodes normally correspond to productions rather than nonterminals, this requires that the CFG be written so that every nonterminal that is an abstract symbol has exactly one production. Rewriting a CFG to satisfy this rule is straightforward, although it may require the introduction of new nonterminals that are not abstract symbols and that have multiple productions. Such nonterminals will be members of the set of concrete syntax nonterminals.

2.2. Formal Description of TSTs

The following mutually recursive set definitions denote the parts of a TST.




Publication date: 2002